Abstract

The forecasting of river stream flows is of significant importance for the
development of early warning systems. Artificial intelligence algorithms
have proven to be an effective tool in data-driven hydrological modeling,
since they allow relationships to be established between the input and
output data of a watershed and thus support data-driven decisions. This
article investigates the applicability of the k-nearest neighbor (KNN)
algorithm for forecasting the mean daily flows of the Ramis river at the
Ramis hydrometric station. As input to the KNN machine learning algorithm,
we used a data set of mean basin precipitation and mean daily flow from
hydrometeorological stations with various lags. The performance of the
KNN algorithm was quantitatively evaluated with hydrological skill
metrics: mean absolute percentage error (MAPE), anomaly correlation
coefficient (ACC), Nash-Sutcliffe efficiency (NSE), Kling-Gupta
efficiency (KGE'), and spectral angle (SA). The KNN forecasts of the
Ramis river flows reached high levels of reliability with flow lags of
one and two days and a precipitation lag of three days. The algorithm is
simple but robust for making short-term flow forecasts and can be
integrated as an alternative to strengthen the daily hydrological
forecast of the Ramis river.
Summary

Forecasting the flows of a river is of great importance for the
development of early warning systems. Artificial intelligence algorithms
have proven to be an effective tool in data-driven hydrological modeling,
since they make it possible to establish relationships between the input
and output data of a watershed and, in this way, to make data-driven
decisions. This article investigates the applicability of the k-nearest
neighbor (KNN) algorithm for forecasting the mean daily flows of the
Ramis river at the Ramis hydrometric station. As input to the KNN machine
learning algorithm, we used a data set of mean basin precipitation and
mean daily flow from hydrometeorological stations with various lags. The
performance of the KNN algorithm was evaluated quantitatively with
hydrological skill metrics, such as the mean absolute percentage error
(MAPE), anomaly correlation coefficient (ACC), Nash-Sutcliffe efficiency
(NSE), Kling-Gupta efficiency (KGE'), and spectral angle (SA). The
results of forecasting the flows of the Ramis river with the KNN machine
learning algorithm reached high levels of reliability, especially with
flow lags of one and two days and a precipitation lag of three days. The
algorithm used is simple but robust for making short-term flow forecasts,
and it can be integrated as an alternative for strengthening the daily
hydrological forecast of the Ramis river.

Keywords: artificial intelligence, data-driven hydrological modeling,
machine learning, Ramis river, k-nearest neighbor.

Tecnología y ciencias del agua, ISSN 2007-2422,
14(2), 169-203. DOI: 10.24850/j-tyca-14-02-05
2023, Instituto Mexicano de Tecnología del Agua. Open Access under the
CC BY-NC-SA 4.0 license
(https://creativecommons.org/licenses/by-nc-sa/4.0/)

Introduction


Floods induced by excessive rainfall and overflowing rivers are common
natural hazards in the regions of Peru. The frequency with which they
occur during flood periods causes significant losses and damage to
property. This phenomenon is likely to become more prevalent with climate
change, and accurate and reliable forecasts of river flows would help
minimize the damage associated with flooding. Accurate short-term
forecasts (hourly and daily) are important for predicting floods and
developing early warning systems (Mundher, Ahmed, & Abdulmohsin, 2015).
Accurate stream flow forecasting is critical to optimal flood control
(Solomatine & Xue, 2004).
Process-based models have become essential tools to study the response
of hydrological regimes (Madsen, 2000; Mendez & Calvo-Valverde, 2016),
but a sufficiently representative and precise implementation can require
considerable time and cost, in addition to the calibration of a large
number of parameters. Since the 1930s, numerous rainfall-runoff models
have been developed, and the entire physical process of the hydrological
cycle has been formulated mathematically in conceptual models that
comprise a large number of parameters (Tokar & Johnson, 1999). In a
data-driven hydrological modeling context, the quote "All models are
wrong and some are useful" is significant because of the presence of
different unresolved queries and deliberate assumptions (Remesan &
Mathew, 2015).

Data-driven models, especially machine learning (ML) techniques, do not
require the complex physical equations and assumed parameters that
process-based models require. Owing to their simple implementation and
more accurate predictions, ML algorithms have been widely applied in
hydrological modeling and forecasting (Mundher et al., 2016; Remesan &
Mathew, 2015; Solomatine & Xue, 2004), achieving good performance even
with small data sets (Veintimilla-Reyes, Cisneros, & Vanegas, 2016).
Data-driven or ML models are capable of forecasting rainfall-runoff even
for a fairly complex system (Solomatine & Xue, 2004).

ML is considered a subfield of artificial intelligence (AI) and is
divided into three main classes: supervised learning, unsupervised
learning, and reinforcement learning (Igual & Seguí, 2017). A review of
the literature shows that AI techniques, specifically supervised ML
algorithms, have successfully demonstrated their applicability in urban
flow prediction (Xie et al., 2020), flood prediction (Mosavi, Ozturk, &
Chau, 2018; Solomatine & Xue, 2004), daily flow forecasting (Mundher et
al., 2015), and the modeling and forecasting of mean monthly flows
(Laqui, 2010; Lujano, Lujano, Quispe, & Lujano, 2014; Mundher et al.,
2016). They have also been applied to wind energy forecasting based on
daily wind speed data (Demolli, Dokuz, Ecemis, & Gokcek, 2019),
evapotranspiration estimation (Granata, 2019; Xu et al., 2018),
hydro-climatological prediction (Thakur, Kalra, Ahmad, & Lamb, 2020),
landslide modeling (Liu et al., 2021), reference evapotranspiration
estimation (Alipour, Yarahmadi, & Mahdavi, 2014; Antonopoulos &
Antonopoulos, 2017; Mehdizadeh, 2018), flood risk assessment modeling
(Wang et al., 2015), modeling of susceptibility to rain-induced
landslides (Dou et al., 2019), rainfall-runoff modeling (Tokar &
Johnson, 1999), and the configuration of rating curve relationships
(Jain & Chalisgaonkar, 2000).

With regard to KNN applied to water resources variables, we found
studies on precipitation forecasting (Huang, Lin, Huang, & Xing, 2017),
multi-model ensemble predictions of precipitation and temperature (Ahmed
et al., 2020), real-time flood forecasting (Liu et al., 2020), and
weather generator models (Sharif & Burn, 2007), as well as on wind
energy prediction (Yesilbudak, Sagiroglu, & Colak, 2017) and data
completion (Kowarik & Templ, 2016).
Since ML algorithms are a promising approach, this paper aimed to
evaluate the k-nearest neighbor machine learning algorithm for
forecasting the stream flows of the Ramis river, based on known data
from the hydrological system (stream flows and precipitation), in order
to contribute to the development of early warning systems and the
strengthening of the hydrological forecast.


Materials and methods 


Study area 


The study was carried out in the Ramis river basin (14 769.62 km²),
which extends from the Ramis hydrometric station to the eastern mountain
range in the department of Puno, Peru (Figure 1). It is the hydrographic
unit that contributes the largest flows to the highest navigable lake in
the world (Titicaca). The altitude of the basin ranges between 3 812 and
5 749 meters above sea level, with an average slope of 22 % and a main
river length of approximately 321 km.
According to the climatic classification of Peru (SENAMHI, 2020), the
basin under study has a predominantly rainy climate with dry autumns and
winters. The multi-year average precipitation of the basin is 700.1 mm
(Fernández, 2017), with the highest accumulated precipitation in summer
(December-February), while the dry autumn and winter define the dry
season. According to the annual land cover classification of the
International Geosphere-Biosphere Programme (IGBP), available in Google
Earth Engine (GEE) as image collection ID MODIS/006/MCD12Q1 (Friedl &
Sulla-Menashe, 2015), the basin has 0.01 % tree cover, 96.86 %
grasslands dominated by herbaceous plants (< 2 m), 0.03 % permanent
wetlands, 1.68 % farmland, 0.13 % urban and built-up land, 0.01 %
permanent snow and ice, 1.25 % arid areas, and 0.02 % water bodies.
Figure 1. Location of the study area.


The daily time series of total precipitation in millimeters (mm) and
mean flows in cubic meters per second (m³/s) were obtained from the
National Meteorology and Hydrology Service of Peru (SENAMHI) for the
period from September 1, 2005, to August 31, 2016. Figure 1 shows the
location of the study area and the spatial distribution of the 14
meteorological stations and one hydrometric station.


K-nearest neighbor (KNN) 


The KNN algorithm is one of the simplest algorithms in the ML field. The
idea is to memorize the training data set and then predict any new data
point by taking as reference the data of its closest neighbors in the
training set (Shalev-Shwartz, Science, Ben-David, & Science, 2013).
Furthermore, KNN is a non-parametric method that can be used as a
classifier (Gupta & Mittal, 2018) and as a regressor (Hossny, Magdi,
Soliman, & Hossny, 2020). The algorithm does not assume any type of
equation or functional relationship between the input and the output
(Joshi, 2020); the output is obtained from the nearest neighbors as:

$$\hat{y} = \frac{1}{k}\sum_{i=1}^{k} y_i \qquad (1)$$

where $\hat{y}$ is the output value, $y_i$ is the value of the $i$th
nearest neighbor, and $k$ is the number of nearest neighbors.
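
As an illustration of Equation (1), the following minimal sketch predicts a value by averaging the targets of the k closest training samples. It is not the implementation used in the study: the Euclidean distance, the function name `knn_predict`, and the toy numbers are assumptions made only for this example.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k nearest training samples (Equation (1))."""
    # Distance from the query to every memorized training sample.
    # Euclidean distance is an assumption; the text does not name a metric.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = [y for _, y in distances[:k]]
    return sum(nearest) / len(nearest)

# Hypothetical toy data: one input (e.g. the previous day's flow, m3/s)
train_X = [(10.0,), (12.0,), (20.0,), (35.0,), (50.0,)]
train_y = [11.0, 13.0, 22.0, 33.0, 48.0]

# The three nearest inputs to 14.0 are 12.0, 10.0 and 20.0,
# so the prediction is the mean of 13.0, 11.0 and 22.0.
prediction = knn_predict(train_X, train_y, query=(14.0,), k=3)
print(prediction)
```

Because the prediction is a plain average of stored targets, no parameters are fitted; the only choices are k and the distance measure.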
Development of the hydrological data-driven modeling


A significant preliminary step in ML algorithms is the selection of the
most important characteristics (variables), in order to obtain a more
effective predictive model and to avoid characteristics that do not
contribute to training. This reduces training time, model complexity,
and overfitting. Using an excessive number of features as model inputs
leads to a near-perfect fit on the training data: the model memorizes
the training set, loses its ability to generalize, and obtains poor
results in the validation stage (Remesan & Mathew, 2015).

There are several ways to measure the importance of characteristics
(Pedregosa, Weiss, & Brucher, 2011; Remesan & Mathew, 2015), but we
focus on the Pearson correlation coefficient (Tokar & Johnson, 1999) and
the permutation importance algorithm (Pedregosa et al., 2011).

The best way to incorporate input characteristics in a data-driven
model is to take into account the lags of the data series (Mundher et
al., 2016; Solomatine & Xue, 2004; Tokar & Johnson, 1999). Accordingly,
the procedure is based on developing hydrological forecasting models
that use memory, that is, retrospective flow and precipitation values,
to forecast the flow $Q_t$ of the Ramis river.
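
The idea of using lagged values as model inputs can be sketched as follows. The helper `make_lagged_dataset` and the short series are hypothetical and only show how each target $Q_t$ is paired with earlier flow and precipitation values; the lags chosen here (flow at 1 and 2 days, precipitation at 3 days) are used purely as an example.

```python
def make_lagged_dataset(flow, precip, flow_lags=(1, 2), precip_lags=(3,)):
    """Pair each target flow[t] with lagged flow and precipitation inputs."""
    max_lag = max(tuple(flow_lags) + tuple(precip_lags))
    X, y = [], []
    # The first max_lag days cannot be used: their lagged inputs do not exist.
    for t in range(max_lag, len(flow)):
        row = [flow[t - lag] for lag in flow_lags]
        row += [precip[t - lag] for lag in precip_lags]
        X.append(row)
        y.append(flow[t])
    return X, y

# Hypothetical daily series (flow in m3/s, precipitation in mm)
flow = [5.0, 6.0, 8.0, 12.0, 11.0, 9.0]
precip = [0.0, 4.0, 10.0, 2.0, 0.0, 0.0]

X, y = make_lagged_dataset(flow, precip)
for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

Each row holds the inputs in the order flow lagged 1 day, flow lagged 2 days, precipitation lagged 3 days, so a six-day series yields three usable samples.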

In the first instance, the inputs were selected based on a
cross-correlation analysis between the input data set (precipitation and
flow) with various lags and the output flows $Q_t$ (Remesan & Mathew,
2015; Solomatine & Xue, 2004; Tokar & Johnson, 1999). Although the
Pearson correlation is best suited to linear systems and the
precipitation-flow process is non-linear (Remesan & Mathew, 2015), it is
commonly used to select appropriate inputs (Huang & Foo, 2002), since it
measures the strength of the relationship between the input and output
time series at various lags (Haugh & Box, 1977).
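
A lagged cross-correlation of this kind can be sketched in a few lines. The functions `pearson` and `lagged_correlation` and the synthetic series below are assumptions for illustration; the synthetic flow is built to respond to precipitation exactly two days later, so the analysis should recover that lag.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(inputs, output, lag):
    """Correlate output[t] with inputs[t - lag]."""
    if lag == 0:
        return pearson(inputs, output)
    return pearson(inputs[:-lag], output[lag:])

# Synthetic series: flow[t] = 1 + 0.5 * precip[t - 2] (hypothetical values)
precip = [0, 5, 0, 0, 8, 0, 0, 3, 0, 0, 6, 0]
flow = [1.0, 1.0] + [1.0 + 0.5 * p for p in precip[:-2]]

correlations = {lag: lagged_correlation(precip, flow, lag) for lag in range(5)}
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

With real data the peak is far less sharp, which is one reason the correlation analysis is complemented by a second importance measure.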

Next, to infer which characteristics have the greatest impact on flow
forecasting, we used the permutation importance algorithm, applied to
the KNN predictive model with the candidate predictors and the predicted
variable. The permutation importance algorithm can be used when the data
are tabular and is especially useful for non-linear estimators. It can
be calculated on the training set or on a held-out test or validation
set, and the drop in the model score indicates how much the model
depends on the characteristic (Pedregosa et al., 2011).


The importance $i_j$ of characteristic $j$ is calculated as:


$$i_j = s - \frac{1}{K}\sum_{k=1}^{K} s_{k,j} \qquad (2)$$




The permutation importance algorithm requires as input the fitted
predictive model and a data set (training or validation). First, the
reference score $s$ of the fitted model is calculated on the data set
(for example, the accuracy for a classifier or $R^2$ for a regressor).
Then, for each characteristic $j$ (column of the data set) and for each
repetition $k$ in $1, \ldots, K$, column $j$ is randomly shuffled to
generate a new version of the data set, and the score $s_{k,j}$ of the
fitted model is calculated on this new version, which gives $K$ scores
per characteristic.
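
The procedure can be sketched in plain Python as below. This is an illustrative re-implementation of the steps just described, not the scikit-learn routine itself; the `predict` function stands in for a fitted model and, like the data, is hypothetical.

```python
import random

def r2_score(observed, predicted):
    """Coefficient of determination used as the model score."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Importance i_j = s - mean_k(s_kj), following Equation (2)."""
    rng = random.Random(seed)
    reference = r2_score(y, [predict(row) for row in X])  # score s
    importances = []
    for j in range(len(X[0])):
        scores = []
        for _ in range(n_repeats):
            # Shuffle only column j; the model itself is not refitted.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            scores.append(r2_score(y, [predict(row) for row in X_perm]))
        importances.append(reference - sum(scores) / n_repeats)
    return importances

# Hypothetical fitted model that uses feature 0 and ignores feature 1
def predict(row):
    return 2.0 * row[0]

X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]

importances = permutation_importance(predict, X, y)
print(importances)
```

Shuffling a column the model ignores leaves the score unchanged, so its importance is zero, while shuffling the column the model relies on produces a large score drop.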


Hydrological data-driven modeling


The training data must be large enough to capture the characteristics of
the basin; otherwise, an insufficient data set would not allow the model
to generalize the patterns of the physical phenomena (Tokar & Johnson,
1999).

The precipitation-flow process was modeled using the KNN algorithm,
representing the current flow of the river, $Q_t$, as a function of the
most important characteristics. The training (calibration) data set
comprised 70 % (2808) of the total data (4012), while the remaining 30 %
(1204) was reserved for the test (validation) stage.
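
The sample counts above can be checked with simple arithmetic. The sketch below rests on an assumption that the text does not state: that the 4012 samples are the daily records of the study period minus the six days lost to the longest lag examined. Under that reading the numbers agree with those reported.

```python
from datetime import date

# Study period stated in the text (both end dates included)
start, end = date(2005, 9, 1), date(2016, 8, 31)
n_days = (end - start).days + 1

# Assumption: rows without a complete set of lagged inputs are dropped,
# and the longest lag examined in the correlation analysis is 6 days.
max_lag = 6
n_samples = n_days - max_lag

n_train = int(0.70 * n_samples)   # calibration set
n_test = n_samples - n_train      # validation set

print(n_days, n_samples, n_train, n_test)
```

Whether the split was chronological or random is not specified in the text; the counts are the same either way.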


Rusli, Yudianto and Liu (2015) indicated that the calibration stage is
carried out to understand the correlation between the model parameters
and the hydrological response of the basin, and also to achieve the best
agreement between the observed and simulated flows. To obtain the best
hydrological forecast model (Equation (1)), the configurations with the
most important characteristics (different lags and variables used to
model $Q_t$), determined by means of the correlation matrix and the
permutation importance algorithm, were trained and tested.


Goodness of fit metrics 


The effectiveness of the models was evaluated using five goodness of
fit metrics (Table 1): mean absolute percentage error (MAPE), anomaly
correlation coefficient (ACC), Nash-Sutcliffe efficiency (NSE),
Kling-Gupta efficiency (KGE') (Kling, Fuchs, & Paulin, 2012), and
spectral angle (SA).
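Table 1. Goodness of fit metrics

| Metric | Equation | Optimal value |
|---|---|---|
| Mean absolute percentage error (MAPE) | $\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left\lvert\frac{S_i - O_i}{O_i}\right\rvert$ | 0 |
| Anomaly correlation coefficient (ACC) | $\mathrm{ACC} = \frac{\sum_{i=1}^{n}(S_i - \bar{S})(O_i - \bar{O})}{\sqrt{\sum_{i=1}^{n}(S_i - \bar{S})^2 \sum_{i=1}^{n}(O_i - \bar{O})^2}}$ | 1 |
| Nash-Sutcliffe efficiency (NSE) | $\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}(S_i - O_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2}$ | 1 |
| Kling-Gupta efficiency (KGE') | $\mathrm{KGE'} = 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\gamma-1)^2}$, with $\beta = \mu_s/\mu_o$ and $\gamma = \mathrm{CV}_s/\mathrm{CV}_o = (\sigma_s/\mu_s)/(\sigma_o/\mu_o)$ | 1 |
| Spectral angle (SA) | $\mathrm{SA} = \arccos\left(\frac{\langle S, O\rangle}{\lVert S\rVert_2 \, \lVert O\rVert_2}\right)$ | 0 |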


Variables: $S$ is the simulated value; $\bar{S}$ is the mean of the
simulated values; $O$ is the observed value; $\bar{O}$ is the mean of
the observed values; $n$ is the number of values; $\sigma$ is the
standard deviation in m³/s; $r$ is the correlation coefficient between
the simulated and observed values (dimensionless); $\beta$ is the bias
ratio (dimensionless); $\gamma$ is the variability ratio
(dimensionless); $\mu$ is the mean value in m³/s; CV is the coefficient
of variation (dimensionless); and the subscripts $s$ and $o$ denote
simulated and observed values, respectively.
MAPE is the mean absolute percentage error, and its range is
0 % ≤ MAPE < ∞, where values near 0 % indicate a low percentage error
and larger values indicate a higher percentage error in the data. ACC is
a common measure in the verification of spatial fields and measures the
correlation between the variation pattern of the simulated values and
that of the observed values. Its range is -1 ≤ ACC ≤ 1, where -1
indicates a negative correlation, 0 indicates complete randomness, and 1
indicates a perfect correlation of the pattern of variation of the
anomalies. NSE is a metric that uses the mean value as a benchmark, and
its range is -∞ < NSE ≤ 1; the closer the value is to unity, the better.
KGE' (Kling et al., 2012) is the modified version of KGE (Gupta, Kling,
Yilmaz, & Martinez, 2009), proposed to avoid cross-correlation between
the bias and variability ratios. Its range is -∞ < KGE' ≤ 1, and values
close to unity indicate good agreement without bias. SA is an attractive
measure for matching spectra: it measures the angle between the two
vectors in hyperspace and indicates how well the shape of the simulated
series matches that of the observed series (not the magnitude). Its
range is -π/2 ≤ SA < π/2, where values close to zero are better.

For this process, we used Python in the Jupyter Notebook environment and
the hydrostats package, which contains the metrics to characterize the
errors between the simulated and observed time series (Roberts,
Williams, Jackson, Nelson, & Ames, 2018).
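
For readers who want to see the arithmetic behind the five metrics, the following sketch implements them in plain Python from their standard definitions. It is not the hydrostats code, whose normalization (for ACC in particular) may differ, and the function names and the short series are assumptions made for this example.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def mape(sim, obs):
    return 100.0 * _mean([abs((s - o) / o) for s, o in zip(sim, obs)])

def acc(sim, obs):
    # Correlation of anomalies about each series mean (standard form)
    ms, mo = _mean(sim), _mean(obs)
    num = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    den = math.sqrt(sum((s - ms) ** 2 for s in sim)
                    * sum((o - mo) ** 2 for o in obs))
    return num / den

def nse(sim, obs):
    mo = _mean(obs)
    ss_res = sum((s - o) ** 2 for s, o in zip(sim, obs))
    return 1.0 - ss_res / sum((o - mo) ** 2 for o in obs)

def kge_prime(sim, obs):
    r = acc(sim, obs)                       # Pearson r, same form as acc here
    beta = _mean(sim) / _mean(obs)          # bias ratio
    gamma = (_std(sim) / _mean(sim)) / (_std(obs) / _mean(obs))  # CV ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

def spectral_angle(sim, obs):
    dot = sum(s * o for s, o in zip(sim, obs))
    norms = math.sqrt(sum(s * s for s in sim)) * math.sqrt(sum(o * o for o in obs))
    # Clamp to [-1, 1] to guard against floating-point round-off
    return math.acos(max(-1.0, min(1.0, dot / norms)))

# Hypothetical flows in m3/s; every simulated value is off by 10 %
observed = [10.0, 20.0, 40.0, 30.0]
simulated = [11.0, 18.0, 44.0, 27.0]

for name, fn in [("MAPE", mape), ("ACC", acc), ("NSE", nse),
                 ("KGE'", kge_prime), ("SA", spectral_angle)]:
    print(name, round(fn(simulated, observed), 3))
```

When the simulated series equals the observed series, these functions return the optimal values described above (0 for MAPE and SA, 1 for ACC, NSE and KGE').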
Results and discussion


We find that the most important characteristic for forecasting the
stream flow $Q_t$ is the stream flow with a one-day lag, $Q_{t-1}$, with
a correlation coefficient of 0.99. As the lag widens to $Q_{t-2}$,
$Q_{t-3}$, $Q_{t-4}$, $Q_{t-5}$, and $Q_{t-6}$, the correlation
coefficient decreases to 0.96, 0.94, 0.91, 0.90, and 0.88, respectively
(Figure 2). For the relationship between precipitation and flow, the
flow series $Q_t$ has its highest correlation with the precipitation
series lagged by four days, $P_{t-4}$ (r = 0.54), compared with
$P_{t-1}$, $P_{t-2}$, $P_{t-3}$, $P_{t-5}$, and $P_{t-6}$, which have
correlation coefficients of 0.39, 0.45, 0.51, 0.53, and 0.52,
respectively. If a $Q_t$ forecast model were developed based on
$Q_{t-2}$, $Q_{t-3}$, $Q_{t-4}$, $Q_{t-5}$, and $Q_{t-6}$ simply because
they have high correlation values, we would obtain a complex model that
includes characteristics that do not contribute to its training.

Figure 2. Cross-correlations of $Q_t$ with precipitation and flow lags.


Then, for a definitive selection of the important characteristics for
the model, the correlation analysis was complemented by applying the
permutation importance algorithm (Figure 3). This confirmed that the
flow with a one-day lag, $Q_{t-1}$, has the highest score (0.779) and is
the most relevant characteristic for forecasting $Q_t$, followed by
$Q_{t-2}$ (0.163) and the less important characteristics $Q_{t-3}$
(0.046), $Q_{t-4}$ (0.039), $Q_{t-5}$ (0.033), and $Q_{t-6}$ (0.037).
Likewise, among the precipitation inputs, $P_{t-3}$ had the highest
score (0.001), while $P_t$, $P_{t-1}$, $P_{t-2}$, $P_{t-4}$, $P_{t-5}$,
and $P_{t-6}$ had the lowest importance scores (< 0.001).

Therefore, the models for the flow forecast were defined by combining
the most important precipitation and flow characteristics according to
the results of the permutation importance algorithm:
1) $Q_t = f(Q_{t-1})$, 2) $Q_t = f(Q_{t-1}, Q_{t-2})$,
3) $Q_t = f(Q_{t-1}, P_{t-3})$, and
4) $Q_t = f(Q_{t-1}, Q_{t-2}, P_{t-3})$, where $Q_t$ is the flow to
forecast, $Q_{t-1}$ and $Q_{t-2}$ are the flows lagged by 1 and 2 days,
and $P_{t-3}$ is the mean precipitation of the basin with a lag of 3
days. Although $P_{t-3}$ is a less important characteristic than
$Q_{t-1}$ and $Q_{t-2}$, we included it in the model as an input
variable, since Pedregosa et al. (2011) indicate that characteristics
considered of low importance could be very important for a good model
and could increase its performance.

Figure 3. Permutation importance scores of the characteristics.


The results of the goodness of fit metrics for the KNN model show its
effectiveness for forecasting the flows of the Ramis river at the Ramis
hydrometric station (Table 2). All models reach very high NSE values
(NSE > 0.96) in the validation stage. In particular,
$Q_t = f(Q_{t-1}, P_{t-3})$ is characterized by high values of ACC =
0.989, NSE = 0.979, and KGE' = 0.988, the lowest error (MAPE = 6.070 %),
and the best match in the shape of the simulated and observed series
(SA = 0.113). $Q_t = f(Q_{t-1})$ is the least effective model, with ACC
= 0.982, NSE = 0.965, KGE' = 0.982, the highest error (MAPE = 7.403 %),
and the poorest match in the shape of the simulated and observed series
(SA = 0.145). $Q_t = f(Q_{t-1}, Q_{t-2}, P_{t-3})$ performs better than
$Q_t = f(Q_{t-1}, Q_{t-2})$ and $Q_t = f(Q_{t-1})$, with MAPE = 6.230 %,
ACC = 0.987, NSE = 0.975, KGE' = 0.988, and SA = 0.122. For its part,
$Q_t = f(Q_{t-1}, Q_{t-2})$ performs better than $Q_t = f(Q_{t-1})$,
with ACC = 0.985, NSE = 0.972, KGE' = 0.985, MAPE = 6.546 %, and a
similar match in the shape of the simulated and observed series (SA =
0.129). It should be noted that, with the flow lagged by one day and
precipitation lagged by three days, the flow forecast model performs
better than the other models. The benefit of adding $P_{t-3}$ to the
model is consistent with Pedregosa et al. (2011): although it is a
characteristic considered of low importance, it produces better results
and increases the performance of the model.


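Table 2. Performance results of the KNN algorithm (validation stage).

| Model | MAPE (%) | ACC | NSE | KGE' | SA |
|---|---|---|---|---|---|
| $Q_t = f(Q_{t-1})$ | 7.403 | 0.982 | 0.965 | 0.982 | 0.145 |
| $Q_t = f(Q_{t-1}, Q_{t-2})$ | 6.546 | 0.985 | 0.972 | 0.985 | 0.129 |
| $Q_t = f(Q_{t-1}, P_{t-3})$ | 6.070 | 0.989 | 0.979 | 0.988 | 0.113 |
| $Q_t = f(Q_{t-1}, Q_{t-2}, P_{t-3})$ | 6.230 | 0.987 | 0.975 | 0.988 | 0.122 |
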
To further evaluate the flow forecasting models, we present a series of
scatter diagrams (Figure 4), which show an almost perfect fit between
the observed and predicted values along the 45° line, especially for
$Q_t = f(Q_{t-1}, P_{t-3})$ (Figure 4c), followed by the other models
(Figures 4a, 4b, and 4d). We can deduce that the data sets chosen to
train and validate the KNN model have the same statistical properties,
and therefore the estimated parameters do not significantly affect the
forecast of $Q_t$ in the validation period; thus, the MAPE, ACC, NSE,
KGE', and SA values are similar in the training and validation periods.
A significant difference in the goodness of fit criteria between the
training and validation sets could arise if the model were trained
using a data set that deviates greatly from the mean situation, which
would significantly affect the forecast in the test period
(Antonopoulos & Antonopoulos, 2017). Training, validation, and test data
sets with the same statistical properties help to develop the best
possible model (Maier, Jain, Dandy, & Sudheer, 2010).

Figure 4. Scatter plot of simulated and observed flows.

Figure 5 illustrates the observed flow pattern and the flow pattern
simulated with the KNN algorithm within the validation period. As
observed, the data-driven forecasts with KNN closely match the actual
values. The KNN algorithm is an effective tool for forecasting the daily
flows of the Ramis river and has the advantage of directly providing
$Q_t$ based on past data, thus reducing the investment in time and cost
required to implement hydrological models based on physical or
conceptual hydrological processes.

Figure 5. Hydrograph of the observed time series and the series
simulated with the KNN algorithm (validation stage).

Conclusions


This study focused on hydrological modeling with the KNN algorithm,
exploring its applicability for forecasting the mean daily flows of the
Ramis river. The most important characteristics were selected in the
first instance using the Pearson correlation coefficient, supplemented
by the permutation importance algorithm. We found that $Q_{t-1}$ is the
most relevant characteristic for the $Q_t$ flow forecast of the Ramis
river at the Ramis hydrometric station; however, when $Q_{t-1}$ and
$P_{t-3}$ are considered together as inputs to the model, the accuracy
of KNN increases.

The research shows that the KNN algorithm is a suitable approach for
flow forecasting and can be integrated as an alternative for
strengthening the daily hydrological forecast and implementing an early
warning system.